9.1.2.1.3 Unsupervised clustering

**K-means**
```{r eval =F}
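# Standardize the data.
# data1 holds the species names in its first column and environmental factors in
# the remaining columns; the species names become the row names for kmeans.
rown <- as.character(data1[, 1])
data2 <- data1[, -1]
rownames(data2) <- rown
head(data1)
data3 <- scale(data2, center = TRUE, scale = TRUE)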
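# Optional sanity check, sketched here as an addition to the original workflow:
# after scale(), each column should have mean ~0 and standard deviation 1.
# `is_standardized` and `tol` are hypothetical names introduced for illustration.
is_standardized <- function(m, tol = 1e-8) {
  col_means <- colMeans(m)
  col_sds <- apply(m, 2, sd)
  all(abs(col_means) < tol) && all(abs(col_sds - 1) < tol)
}
# Usage with the matrix built above: is_standardized(data3)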
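# Plot the total within-cluster sum of squares against k to choose the number of clusters.
# k-means generally suits lower-dimensional data, so this result may be of limited value here.
set.seed(123)
library(factoextra)
library(ggplot2)  # provides geom_vline()
fviz_nbclust(data3, kmeans, method = "wss") +
  geom_vline(xintercept = 4, linetype = 2)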
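# A minimal sketch of what the "wss" curve measures, added for illustration:
# the total within-cluster sum of squares from kmeans() for k = 1..k_max.
# `wss_by_k`, `k_max` and the nstart default are assumptions, not the author's code.
wss_by_k <- function(x, k_max = 10, nstart = 10) {
  vapply(seq_len(k_max), function(k) {
    kmeans(x, centers = k, nstart = nstart)$tot.withinss
  }, numeric(1))
}
# Usage: wss <- wss_by_k(data3)
#        plot(seq_along(wss), wss, type = "b", xlab = "k", ylab = "Total within-cluster SS")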
# Judging from the scree (elbow) plot, the best number of clusters may be 6, 7, or 8.
km_res6 <- kmeans(data3, centers = 6, nstart = 10, iter.max = 10)
cluster6 <- km_res6$cluster
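# A rough numeric check of cluster quality, sketched as an addition: the share of
# total variance explained by the partition (betweenss / totss, closer to 1 means
# tighter clusters) and the cluster sizes. `kmeans_quality` is a hypothetical helper.
kmeans_quality <- function(fit) {
  list(
    explained = fit$betweenss / fit$totss,
    sizes = fit$size
  )
}
# Usage: kmeans_quality(km_res6)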
# The plot below also shows that this approach does not model the data well.
fviz_cluster(km_res6,
  data = data3, ellipse.type = "euclid", star.plot = TRUE,
  repel = TRUE, ggtheme = theme_minimal(), main = "k-means, k = 6"
)
```

**Hierarchical clustering**
```{r eval =F}
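### Classification: hierarchical clustering
#### Hierarchical cluster analysis ####
# Euclidean distance matrix on the standardized data
result <- dist(data3, method = "euclidean")
# Build the hierarchy with Ward's method
result_hc <- hclust(d = result, method = "ward.D2")
# Preliminary display of the dendrogram
fviz_dend(result_hc, cex = 0.6, horiz = TRUE)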
# Cut the tree into 8 groups and color the labels by group
fviz_dend(result_hc,
  k = 8,
  cex = 0.5,
  color_labels_by_k = TRUE,
  rect = TRUE, horiz = TRUE
)
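# A minimal sketch, added for illustration: cutree() returns the group membership
# that the k = 8 rectangles above display. `hc_groups` is a hypothetical helper;
# it repeats the Euclidean / ward.D2 choices used in this chunk.
hc_groups <- function(x, k, method = "ward.D2") {
  hc <- hclust(dist(x, method = "euclidean"), method = method)
  cutree(hc, k = k)
}
# Usage: groups8 <- hc_groups(data3, k = 8); table(groups8)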
```

**t-SNE**
```{r eval =F}
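# Unsupervised clustering: t-SNE
install.packages("Rtsne")
library(Rtsne)
# The first column of tem holds the data labels
# Choose the train.csv file downloaded from the link above (data1 is used here)
# Curating the database for analysis with both t-SNE and PCA
train <- tem <- data1
labels <- train$names  # the label column is assumed to be called `names`
train$label <- as.factor(train$names)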
# For plotting: one color per label
colors <- rainbow(length(unique(train$label)))
names(colors) <- unique(train$label)
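# Executing the algorithm on curated data
# Drop the first (label) column and the added `label` factor so only features are embedded
features <- train[, -1]
features$label <- NULL
tsne <- Rtsne(features, dims = 2, perplexity = 8, verbose = TRUE, max_iter = 500)
# Timing run (note: perplexity = 10 here, not 8 as in the run above)
exeTimeTsne <- system.time(Rtsne(features, dims = 2, perplexity = 10, verbose = TRUE, max_iter = 500))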
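# Rtsne() rejects a perplexity that is too large for the sample size; the
# condition it needs is 3 * perplexity <= nrow(X) - 1. A small sketch of that
# bound, added for illustration; `max_perplexity` is a hypothetical helper name.
max_perplexity <- function(n_rows) floor((n_rows - 1) / 3)
# Usage: max_perplexity(nrow(train)) should be at least the perplexity passed above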
# Plotting
plot(tsne$Y, t = "n", main = "tsne")
text(tsne$Y, labels = train$label, col = colors[as.character(train$label)])
# Redraw the plot (improved version)
tsnedata <- data.frame(tsne$Y, type = train$label)
names(tsnedata) <- c("tsne_1", "tsne_2", "class")
## head(tsnedata)
library(ggpubr)
ggscatter(tsnedata,
  x = "tsne_1", y = "tsne_2",
  color = "class", size = 0.5, label = "class", font.label = c(7, "plain"),
  main = "tSNE plot"
)
```
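
The t-SNE chunk mentions PCA as a point of comparison but does not show it. The following is a minimal sketch, assuming data laid out like `data1` (labels in one column, numeric variables in the rest). The built-in `iris` data stands in so the chunk runs on its own, and `pca_scores` is a hypothetical helper name, not part of the original analysis.

```{r eval =F}
# Project numeric columns onto the first principal components (base R prcomp).
pca_scores <- function(x, n_comp = 2) {
  pca <- prcomp(x, center = TRUE, scale. = TRUE)
  as.data.frame(pca$x[, seq_len(n_comp), drop = FALSE])
}

scores <- pca_scores(iris[, 1:4])   # iris is a stand-in for the feature columns
scores$class <- iris$Species        # labels are kept outside the projection
plot(scores$PC1, scores$PC2,
  col = as.integer(scores$class),
  xlab = "PC1", ylab = "PC2", main = "PCA"
)
```

Unlike the t-SNE coordinates, the PCA axes are linear combinations of the input variables, so the two plots are best read side by side rather than compared point by point.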